Text classification can be useful in many real-world scenarios, saving end users a great deal of time. However, building a custom classifier typically requires coding skills and ML knowledge, which constitutes a significant barrier for many potential users. To lift this barrier we introduce Label Sleuth, a free open-source system for labeling text and creating text classifiers. The system is unique in (a) being a no-code system, (b) guiding users from unlabeled data to a custom classifier within a few hours, and (c) being open for developers to configure and extend. By open-sourcing Label Sleuth we hope to build a community of users and developers that will broaden the utilization of NLP models.
Termination analysis investigates the termination behavior of programs, attempting to detect non-termination, which is known to cause a variety of program bugs (e.g. hanging programs, denial-of-service vulnerabilities). Beyond formal methods, various attempts have been made to estimate the termination behavior of programs using neural networks. However, most of these approaches continue to rely on formal methods to provide strong soundness guarantees and consequently suffer from similar limitations. In this paper, we move away from formal methods and embrace the stochastic nature of machine learning models. Rather than aiming for rigorous guarantees that can be interpreted by solvers, our objective is to provide an estimate of a program's termination behavior, along with (when applicable) likely causes of non-termination that programmers can use for debugging. Compared to previous neural approaches to program termination, we also exploit the graph representation of programs by employing graph neural networks. To further assist programmers in understanding and debugging non-termination bugs, we adapt the notions of attention and semantic segmentation, previously used in other application domains. Overall, we design and implement classifiers for program termination based on graph convolutional networks and graph attention networks, as well as a semantic segmentation graph neural network that localizes the AST nodes likely to cause non-termination. We also illustrate how the information provided by semantic segmentation can be combined with program slicing to further aid debugging.
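The graph-convolution idea underlying the termination classifier can be sketched in a few lines. This is a hypothetical toy, not the paper's architecture: one mean-aggregation message-passing step over a small AST-like program graph, with made-up node features.

```python
# Hypothetical sketch: one graph-convolution step over a small program graph.
# Features and graph layout are illustrative stand-ins for an AST embedding.

def gcn_step(features, edges):
    """Average each node's feature vector with its neighbors' (mean aggregation)."""
    n = len(features)
    neighbors = {i: [i] for i in range(n)}  # self-loops
    for src, dst in edges:
        neighbors[src].append(dst)
        neighbors[dst].append(src)
    dim = len(features[0])
    out = []
    for i in range(n):
        agg = [0.0] * dim
        for j in neighbors[i]:
            for d in range(dim):
                agg[d] += features[j][d]
        out.append([v / len(neighbors[i]) for v in agg])
    return out

# Toy AST-like graph: a while-loop node (0) connected to its condition (1) and body (2).
feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
edges = [(0, 1), (0, 2)]
updated = gcn_step(feats, edges)
```

Stacking such steps lets information flow between distant AST nodes before a classification or per-node segmentation readout.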
Although a large body of research on explainable AI focuses on producing effective explanations, less work is devoted to the question of how people understand and interpret those explanations. In this work, we address this question through a study of saliency-based explanations over textual data. Feature-attribution explanations of text models aim to communicate which parts of the input text were more influential than others. Many current explanation methods, such as gradient-based or Shapley-value-based methods, provide measures of importance that are mathematically well understood. But how does a person receiving an explanation (the explainee) comprehend it? And does their understanding match what the explanation attempted to communicate? We empirically investigate how various factors of the input, the feature-attribution explanation, and the visualization procedure affect laypeople's interpretation of the explanation. We ask crowdworkers to interpret explanations on tasks in English and German, and fit their responses against the factors of interest. We find that people often misinterpret the explanations: superficial and unrelated factors, such as word length, influence the explainees' importance assignments even though the explanation communicates importance directly. We then show that some of these distortions can be attenuated: we propose a method to adjust saliencies based on model estimates of over- and under-perception, and we explore bar charts as an alternative to heatmap saliency visualization. We find that both approaches can mitigate the distorting effect of specific factors, leading to a better understanding of the explanation.
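The two mitigations above can be illustrated with a toy sketch. This is not the paper's actual correction model: the per-character bias term and the bar rendering are hypothetical, standing in for model-estimated over-/under-perception and the bar-chart alternative to heatmaps.

```python
# Illustrative only: mimic "adjust saliencies for a perception bias" with a
# hypothetical linear word-length correction, and render bars instead of a heatmap.

def adjust_saliency(words, saliencies, length_bias=0.02):
    """Subtract a per-character bias, assuming longer words are over-perceived."""
    return [max(0.0, s - length_bias * len(w)) for w, s in zip(words, saliencies)]

def bar_chart(words, saliencies, width=10):
    """Render saliencies as horizontal text bars, an alternative to heatmaps."""
    return [f"{w:<16}|{'#' * round(s * width)}" for w, s in zip(words, saliencies)]

words = ["the", "extraordinarily", "good", "film"]
raw = [0.10, 0.40, 0.30, 0.20]
adjusted = adjust_saliency(words, raw)
```

After the correction, the long word "extraordinarily" no longer dominates simply by virtue of its length.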
When explaining AI behavior to humans, how does the human explainee comprehend the communicated information, and does it match what the explanation attempted to communicate? When can we say that an explanation is explaining something? We aim to provide answers by leveraging the theory-of-mind literature on the folk concepts that humans use to understand behavior. We establish a framework of social attribution by the human explainee, which describes the function of explanations: the information that humans comprehend from them. Specifically, effective explanations should produce a mental model that is coherent (conveying information about other contrastive cases), complete (conveying an explicit causal narrative of the contrastive cases, with representations of the causes, the effects, and extraneous causes), and interactive (surfacing and resolving contradictions, and generalizing properties through interrogation). We demonstrate that many XAI mechanisms can be mapped to folk concepts of behavior. This allows us to uncover the failure modes that prevent current methods from explaining effectively, as well as what is necessary to enable coherent explanations.
Modern histopathology image analysis relies on the segmentation of cell structures to derive quantitative metrics required in biomedical research and clinical diagnostics. State-of-the-art deep learning approaches predominantly apply convolutional layers for segmentation and are typically highly customized to a specific experimental configuration; they often fail to generalize to unseen data. Since the model capacity of classical convolutional layers is limited by a finite set of learned kernels, our approach instead uses a graph representation of the image and focuses on node transitions across multiple magnifications. We propose a novel architecture for the semantic segmentation of cell nuclei that is robust to differences in experimental configuration, such as staining and variation in cell types. The architecture comprises a novel neuroplastic graph attention network, built on residual graph attention layers, that concurrently optimizes the graph structure representing the multiple magnification levels of the histopathology image. The modification of the graph structure, which generates the node features by projection, is as important to the architecture as the graph neural network itself: it determines the possible message flows and the critical properties for optimizing attention, graph structure, and node updates under a balanced magnification loss. In experimental evaluation, our framework outperforms ensembles of state-of-the-art neural networks, typically requiring only a fraction of the neurons, and sets a new standard for segmentation on a novel nuclei dataset.
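The attention mechanism at the heart of a graph attention layer can be sketched minimally. This is a generic illustration of attention-weighted neighbor aggregation, not the residual layers of the architecture above; the dot-product scorer and toy features are assumptions.

```python
# Hypothetical sketch of the core graph-attention operation: softmax-normalize
# scores over a node's neighbors, then mix neighbor features by those weights.
import math

def attention_aggregate(node, neighbors, score):
    """Softmax the neighbor scores, then aggregate neighbor features accordingly."""
    scores = [score(node, nb) for nb in neighbors]
    m = max(scores)                       # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(neighbors[0])
    return [sum(w * nb[d] for w, nb in zip(weights, neighbors)) for d in range(dim)]

# Toy scoring function: dot product between node and neighbor features.
dot = lambda a, b: sum(x * y for x, y in zip(a, b))
out = attention_aggregate([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], dot)
```

The attention weights let the network emphasize the neighbors (here, nuclei at other magnifications) most relevant to each node update.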
Extraction of financial and economic events from text has previously been done mostly using rule-based methods, with more recent works employing machine learning techniques. This work is in line with this latter approach, leveraging relevant Wikipedia sections to extract weak labels for sentences describing economic events. Whereas previous weakly supervised approaches required a knowledge-base of such events, or corresponding financial figures, our approach requires no such additional data, and can be employed to extract economic events related to companies which are not even mentioned in the training data.
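The weak-labeling idea can be sketched as follows. The section names and page format here are invented for illustration; the paper's actual mapping from Wikipedia sections to labels is not specified in the abstract.

```python
# Illustrative sketch: derive weak labels from Wikipedia-style section headings.
# EVENT_SECTIONS is a hypothetical choice of sections assumed to describe events.

EVENT_SECTIONS = {"History", "Acquisitions", "Controversies"}

def weak_label_sentences(sections):
    """Label each sentence by its section: 1 = economic event, 0 = other."""
    labeled = []
    for heading, sentences in sections:
        label = 1 if heading in EVENT_SECTIONS else 0
        labeled.extend((s, label) for s in sentences)
    return labeled

page = [
    ("History", ["The company acquired RivalCorp in 2010."]),
    ("Products", ["Its flagship product is a database engine."]),
]
data = weak_label_sentences(page)
```

A sentence classifier trained on such weak labels can then be applied to text about companies never seen in training.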
This paper presents a machine learning approach to multidimensional item response theory (MIRT), a class of latent factor models that can be used to model and predict student performance from observed assessment data. Inspired by collaborative filtering, we define a general class of models that includes many MIRT models. We discuss the use of penalized joint maximum likelihood (JML) to estimate individual models and cross-validation to select the best performing model. This model evaluation process can be optimized using batching techniques, such that even sparse large-scale data can be analyzed efficiently. We illustrate our approach with simulated and real data, including an example from a massive open online course (MOOC). The high-dimensional model fit to this large and sparse dataset does not lend itself well to traditional methods of factor interpretation. By analogy to recommender-system applications, we propose an alternative "validation" of the factor model, using auxiliary information about the popularity of items consulted during an open-book exam in the course.
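The MIRT response model underlying this model class can be written compactly. This is a minimal sketch of a standard multidimensional two-parameter logistic model with made-up parameter values, not the paper's full penalized JML machinery.

```python
# Minimal MIRT sketch: probability of a correct response from a student ability
# vector theta and item parameters (discrimination vector a, difficulty b),
# plus the joint log-likelihood that JML estimation would maximize.
import math

def p_correct(theta, a, b):
    """Multidimensional 2PL: sigmoid(theta . a - b)."""
    z = sum(t * ai for t, ai in zip(theta, a)) - b
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(responses):
    """Joint log-likelihood over (theta, a, b, observed 0/1 response) tuples."""
    ll = 0.0
    for theta, a, b, y in responses:
        p = p_correct(theta, a, b)
        ll += math.log(p) if y == 1 else math.log(1.0 - p)
    return ll

p = p_correct(theta=[1.0, -0.5], a=[1.0, 0.5], b=0.0)
```

Penalized JML adds a regularization term to this log-likelihood and optimizes over both student and item parameters jointly, which is what makes batching over sparse response matrices effective.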
For applications that require processing large amounts of text at inference time, Large Language Models (LLMs) are handicapped by their limited context windows, which are typically 2048 tokens. In-context learning, an emergent phenomenon in LLMs in sizes above a certain parameter threshold, constitutes one significant example because it can only leverage training examples that fit into the context window. Existing efforts to address the context window limitation involve training specialized architectures, which tend to be smaller than the sizes in which in-context learning manifests due to the memory footprint of processing long texts. We present Parallel Context Windows (PCW), a method that alleviates the context window restriction for any off-the-shelf LLM without further training. The key to the approach is to carve a long context into chunks (``windows'') that fit within the architecture, restrict the attention mechanism to apply only within each window, and re-use the positional embeddings among the windows. We test the PCW approach on in-context learning with models that range in size between 750 million and 178 billion parameters, and show substantial improvements for tasks with diverse input and output spaces. Our results motivate further investigation of Parallel Context Windows as a method for applying off-the-shelf LLMs in other settings that require long text sequences.
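The PCW bookkeeping described above (re-used positional embeddings, attention confined to each window) can be sketched directly. This is a simplified illustration of the mechanism, leaving out the task tokens that would attend across all windows.

```python
# Sketch of the core PCW layout: position ids repeat across windows, and a
# context token may only attend causally within its own window. Window size
# and count here are illustrative.

def pcw_layout(num_windows, window_len):
    """Return (position_ids, attend) where attend(i, j) says token i may attend to j."""
    position_ids = [p for _ in range(num_windows) for p in range(window_len)]
    window_of = [w for w in range(num_windows) for _ in range(window_len)]

    def attend(i, j):
        # causal attention, restricted to the same window
        return window_of[i] == window_of[j] and j <= i

    return position_ids, attend

pos, attend = pcw_layout(num_windows=2, window_len=3)
```

Because positions restart in every window, each chunk stays inside the positional range the model was trained on, which is why no further training is needed.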
Position modeling plays a critical role in Transformers. In this paper, we focus on length extrapolation, i.e., training on short texts while evaluating longer sequences. We define attention resolution as an indicator of extrapolation. Then we propose two designs to improve the above metric of Transformers. Specifically, we introduce a relative position embedding to explicitly maximize attention resolution. Moreover, we use blockwise causal attention during inference for better resolution. We evaluate different Transformer variants with language modeling. Experimental results show that our model achieves strong performance in both interpolation and extrapolation settings. The code will be available at https://aka.ms/LeX-Transformer.
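One common form of the blockwise causal attention mentioned above can be sketched as a mask: a query attends causally, but only to its own block and the block before it, keeping relative distances within the trained range. The exact windowing is an assumption; the paper's variant may differ in detail.

```python
# Sketch: blockwise causal attention mask. Position i attends causally, but only
# within its own block and the immediately preceding block. Block size is illustrative.

def blockwise_causal_mask(seq_len, block_size):
    """mask[i][j] is True when position i may attend to position j."""
    mask = [[False] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        for j in range(i + 1):  # causal: j <= i
            bi, bj = i // block_size, j // block_size
            mask[i][j] = bi - bj <= 1  # own block or the one before
    return mask

mask = blockwise_causal_mask(seq_len=6, block_size=2)
```

Capping the attended span this way bounds the relative positions seen at inference, which is what allows evaluation on sequences longer than those used for training.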
Despite their widespread adoption, neural conversation models have yet to exhibit natural chat capabilities with humans. In this research, we examine user utterances as causes and generated responses as effects, recognizing that changes in a cause should produce a different effect. To further explore this concept, we have compiled and expanded upon a new dataset called CausalDialogue through crowd-sourcing. This dataset includes multiple cause-effect pairs within a directed acyclic graph (DAG) structure. Our analysis reveals that traditional loss functions can struggle to effectively incorporate the DAG structure, leading us to propose a causality-enhanced method called Exponential Maximum Average Treatment Effect (ExMATE) to enhance the impact of causality at the utterance level in training neural conversation models. To evaluate the effectiveness of this approach, we have built a comprehensive benchmark using the CausalDialogue dataset leveraging large-scale pre-trained language models, and have assessed the results through both human and automatic evaluation metrics for coherence, diversity, and agility. Our findings show that current techniques are still unable to effectively address conversational DAGs, and that the ExMATE method can improve the diversity and agility of conventional loss functions while maintaining coherence.
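The abstract does not spell out the ExMATE loss itself, so the following only sketches the underlying treatment-effect intuition at the utterance level: compare the likelihood a model assigns a response under the true cause utterance versus an altered one. The scorer and all names here are hypothetical.

```python
# Illustrative sketch of an utterance-level average treatment effect: a change
# in the cause utterance should change the likelihood of the effect (response).
import math

def average_treatment_effect(pairs, log_prob):
    """Mean difference in response log-likelihood between cause and altered cause."""
    effects = [log_prob(cause, resp) - log_prob(alt_cause, resp)
               for cause, alt_cause, resp in pairs]
    return sum(effects) / len(effects)

# Toy scorer: higher log-probability when cause and response share words.
def toy_log_prob(cause, resp):
    overlap = len(set(cause.split()) & set(resp.split()))
    return math.log(1 + overlap)

ate = average_treatment_effect(
    [("are you hungry", "are you tired", "yes I am hungry")], toy_log_prob)
```

A causality-enhanced training objective can then reward responses whose likelihood is sensitive to the cause utterance, rather than generic replies that fit any context.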